
feat(search): paginate search_people via max_pages#527

Open
PierreKasparian wants to merge 3 commits into stickerdaniel:main from PierreKasparian:feature/526-search-people-pagination

@PierreKasparian

Closes #526

Summary

Adds optional pagination to search_people, mirroring the existing search_jobs pattern. Previously search_people returned only the first results page (~10 people) with no way to fetch more.

Changes

  • tools/person.py adds a new max_pages: Annotated[int, Field(ge=1, le=10)] = 1 parameter, forwarded to the extractor; docstring and log line updated.
  • In scraping/extractor.py, search_people loops over LinkedIn's &page=N URL parameter (1-based) up to max_pages, with _NAV_DELAY between navigations. Page texts are joined into the single search_results section with \n---\n; references are deduped by URL across pages.
  • End-of-results detection is locale-independent: pagination stops early when a page beyond the first surfaces no new kind: person references, so over-requesting pages is harmless.
  • Default max_pages=1 preserves the current single-page behavior exactly (opt-in pagination).
  • Tests, README and manifest.json updated.
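The loop described in the Changes list can be sketched roughly as follows. This is not the code from this PR: search_people_pages, fetch_page, Page, Reference and NAV_DELAY are hypothetical stand-ins (the real extractor navigates a browser page rather than calling a plain function), and the sketch assumes page 1 uses the unmodified search URL while later pages append &page=N.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass


# Hypothetical stand-ins: the extractor's real result types are not shown in the PR.
@dataclass(frozen=True)
class Reference:
    kind: str  # e.g. "person"
    url: str


@dataclass
class Page:
    text: str
    references: list[Reference]


NAV_DELAY = 0.0  # placeholder for the extractor's _NAV_DELAY (seconds)


def search_people_pages(
    base_url: str,
    fetch_page: Callable[[str], Page],
    max_pages: int = 1,
    nav_delay: float = NAV_DELAY,
) -> tuple[str, list[Reference]]:
    """Fetch up to max_pages result pages, joining texts and deduping references."""
    texts: list[str] = []
    references: list[Reference] = []
    seen_urls: set[str] = set()

    for page_num in range(1, max_pages + 1):
        if page_num > 1:
            time.sleep(nav_delay)  # throttle between navigations
        # Assumption: page 1 keeps the original URL; later pages add &page=N.
        url = base_url if page_num == 1 else f"{base_url}&page={page_num}"
        page = fetch_page(url)

        # Keep only references whose URL has not been seen on any earlier page.
        new_refs: list[Reference] = []
        for ref in page.references:
            if ref.url not in seen_urls:
                seen_urls.add(ref.url)
                new_refs.append(ref)

        # Locale-independent stop: a page past the first that adds no new
        # person references is treated as the end of the results.
        if page_num > 1 and not any(r.kind == "person" for r in new_refs):
            break

        texts.append(page.text)
        references.extend(new_refs)

    return "\n---\n".join(texts), references
```

With max_pages=1 the loop runs once and never sleeps, which matches the opt-in default. Dropping the text of the page that triggers the early stop is a choice made in this sketch; the PR description does not say what happens to that page's text.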

Test plan

  • uv run ruff format / ruff check / ty check — all pass.
  • uv run pytest: the full suite passes, except the unrelated, umask-dependent test_harden_linkedin_tree_noop_outside_linkedin, which also fails on main.
  • New extractor tests: single-page default, multi-page aggregation + dedup + &page=2 cursor, early-stop when a page adds no new people. New tool test: max_pages forwarding.
  • Verified end-to-end against live LinkedIn.

Synthetic prompt

Add optional pagination to the search_people tool so it can return more than one page of results, following the existing search_jobs max_pages pattern. Add a max_pages parameter (1-10, default 1) to the tool and LinkedInExtractor.search_people, paginate via LinkedIn's &page=N URL parameter, join page texts with \n---\n into the search_results section, dedupe references by URL, and stop early in a locale-independent way when a page surfaces no new kind: person references. Keep default behavior unchanged. Update tests, README and manifest.json.

Generated with Claude Opus 4.8
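Appending &page=N to the search URL works when the URL already has a query string and no page parameter. A slightly more defensive variant, sketched here with a hypothetical helper with_page built only on the standard library's urllib.parse, sets the parameter instead of appending it. Whether LinkedIn's search URLs ever need this is an assumption, not something the PR states.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_page(url: str, page: int) -> str:
    """Return url with its 'page' query parameter set to page (1-based)."""
    if page < 1:
        raise ValueError("page numbers are 1-based")
    parts = urlsplit(url)
    # Drop any existing page parameter and keep the other parameters in order.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "page"
    ]
    query.append(("page", str(page)))
    return urlunsplit(parts._replace(query=urlencode(query)))
```

Note that urlencode re-encodes the whole query, so a space written as %20 comes back as +; both are common encodings of a space in query strings.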

@PierreKasparian PierreKasparian marked this pull request as ready for review June 20, 2026 13:46
@greptile-apps

greptile-apps Bot commented Jun 20, 2026

Contributor

Greptile Summary

This PR adds optional pagination to search_people via a new max_pages parameter (1–10, default 1), mirroring the existing search_jobs pattern. Results from multiple pages are joined with \n---\n in the search_results section; references are deduped by URL across pages; and pagination stops early in a locale-independent way when a page returns no new kind: "person" references.

  • extractor.py: New loop over &page=N (1-based) up to max_pages, with _NAV_DELAY between iterations and early-stop when no new person URLs are detected.
  • tools/person.py: max_pages: Annotated[int, Field(ge=1, le=10)] = 1 added, forwarded to the extractor; log line updated.
  • Tests + docs: Three new extractor-level tests (single-page, multi-page join/dedup, early-stop) and one new tool-level forwarding test; README and manifest.json descriptions updated.

Confidence Score: 4/5

Safe to merge with one gap to address: the pagination loop has no exception handler, so a mid-run navigation error discards already-fetched pages.

The pagination logic, dedup, and early-stop are all correct and well-tested. The one gap is that the per-page loop body has no try/except guard equivalent to what search_jobs uses, so an unexpected exception on any page after the first will propagate up and discard all results already accumulated from prior pages.

linkedin_mcp_server/scraping/extractor.py — the search_people pagination loop needs a try/except wrapper matching the search_jobs pattern.
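The guard Greptile suggests could look something like the following. This is a hypothetical, synchronous sketch (collect_pages and fetch_page are illustrative names); it does not reproduce search_jobs, whose exact handler is not shown here. A failure on page 2 or later is logged and the pages already collected are returned, while a failure on page 1 still propagates because there is nothing to salvage.

```python
import logging
from collections.abc import Callable

logger = logging.getLogger(__name__)


def collect_pages(fetch_page: Callable[[int], str], max_pages: int) -> list[str]:
    """Fetch pages 1..max_pages, keeping what was fetched if a later page fails."""
    texts: list[str] = []
    for page_num in range(1, max_pages + 1):
        try:
            texts.append(fetch_page(page_num))
        except Exception:
            if page_num == 1:
                raise  # nothing collected yet: surface the error to the caller
            # Partial results are still useful; log and stop paginating.
            logger.warning(
                "search_people: page %d failed; returning %d page(s)",
                page_num,
                len(texts),
                exc_info=True,
            )
            break
    return texts
```

Whether a page-1 failure should propagate or yield an empty result is a design choice; re-raising keeps hard failures visible. In async code, catching Exception does not swallow asyncio.CancelledError, which derives from BaseException on Python 3.8 and later.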

Important Files Changed

| Filename | Overview |
|---|---|
| linkedin_mcp_server/scraping/extractor.py | Adds a pagination loop to search_people mirroring search_jobs, but omits the try/except guard that search_jobs uses to preserve partial results on mid-pagination errors. |
| linkedin_mcp_server/tools/person.py | Adds the max_pages parameter with the correct Field(ge=1, le=10) constraint, forwards it to the extractor, and updates the log line; clean and correct. |
| tests/test_scraping.py | Three new extractor-level tests cover the single-page default, multi-page join + dedup, and early-stop detection; all are placed within the existing TestSearchJobs class (a pre-existing class-naming mismatch, not introduced here). |
| tests/test_tools.py | Existing tool tests updated to assert the max_pages=1 default; a new forwarding test verifies that max_pages=3 is passed through correctly. |
| manifest.json | Description updated to mention multi-page pagination; a straightforward documentation change. |
| README.md | Feature table entry updated to reflect the new pagination capability; accurate and minimal. |

Reviews (2): Last reviewed commit: "Merge branch 'main' into feature/526-sea..."

Comment thread linkedin_mcp_server/scraping/extractor.py

Development

Successfully merging this pull request may close these issues.

[FEATURE] Paginate search_people to retrieve more than 10 results

1 participant